Keyphrase Extraction for Technical Language Processing

Abstract

Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined keyphrase extraction through the lens of a hypothetical toolkit that consists of a combination of text features and classifiers suitable for use in low-resource applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases, and the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC), and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that the toolkit approach was competitive with Maui when author-provided keyphrases were removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 %, while the toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, the toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui's F-measure of 18.8 %.
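The F-measure figures quoted in the abstract combine precision and recall of the extracted keyphrases against a reference set. The sketch below shows one common way to compute it. The abstract does not state the matching rule, so exact matching after lowercasing is an assumption here, and the function name `f_measure` and the sample phrases are purely illustrative.

```python
def f_measure(extracted, gold):
    """Return (precision, recall, F1) for two lists of keyphrases.

    Assumption: a phrase counts as correct only on an exact match
    after trimming whitespace and lowercasing. The paper may use a
    different rule (for example, stemming before comparison).
    """
    extracted_set = {p.strip().lower() for p in extracted}
    gold_set = {p.strip().lower() for p in gold}
    if not extracted_set or not gold_set:
        return 0.0, 0.0, 0.0

    matches = len(extracted_set & gold_set)
    precision = matches / len(extracted_set)
    recall = matches / len(gold_set)
    if matches == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 4 extracted phrases appear among 3 gold phrases.
extracted = ["vapor pressure", "enthalpy", "density", "heat capacity"]
gold = ["Vapor pressure", "density", "viscosity"]
precision, recall, f1 = f_measure(extracted, gold)
```

In this made-up example precision is 2/4 and recall is 2/3, so the F-measure is 4/7, or roughly 57 %.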

Similar resources

Likey: Unsupervised Language-Independent Keyphrase Extraction

Likey is an unsupervised statistical approach for keyphrase extraction. The method is language-independent and the only language-dependent component is the reference corpus with which the documents to be analyzed are compared. In this study, we have also used another language-dependent component: an English-specific Porter stemmer as a preprocessing step. In our experiments of keyphrase extract...

A Language Model Approach To Keyphrase Extraction

We present a new approach to extracting keyphrases based on statistical language models. Our approach is to use pointwise KL-divergence between multiple language models for scoring both phraseness and informativeness, which can be unified into a single score to rank extracted phrases.
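As a rough illustration of the scoring idea in this abstract, the pointwise KL-divergence contribution of a phrase w under two models p and q is p(w) log(p(w)/q(w)). The sketch below rests on several assumptions the excerpt does not confirm: phraseness compares a phrase-level foreground model with the product of foreground unigram probabilities, informativeness compares the foreground model with a background model, and the two are unified by simple addition. All function and parameter names are hypothetical.

```python
import math


def pointwise_kl(p, q):
    """Contribution of one event to KL(p || q): p * log(p / q)."""
    return p * math.log(p / q)


def score_phrase(p_fg_phrase, p_fg_unigrams, p_bg_phrase):
    """Score a candidate phrase (illustrative only).

    p_fg_phrase   -- probability of the phrase under a foreground n-gram model
    p_fg_unigrams -- foreground unigram probabilities of its words
    p_bg_phrase   -- probability of the phrase under a background model
    """
    # Phraseness (assumed form): how much more likely the phrase is
    # than its words occurring independently.
    p_independent = math.prod(p_fg_unigrams)
    phraseness = pointwise_kl(p_fg_phrase, p_independent)

    # Informativeness (assumed form): how much more likely the phrase
    # is in the foreground corpus than in the background corpus.
    informativeness = pointwise_kl(p_fg_phrase, p_bg_phrase)

    # Assumption: the single ranking score is the plain sum.
    return phraseness + informativeness


# Hypothetical probabilities for a two-word candidate phrase.
score = score_phrase(0.02, [0.1, 0.1], 0.005)
```

Candidates would then be ranked by this score in descending order. Smoothing is omitted, so every probability passed in must be strictly positive.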

Single-Document Keyphrase Extraction for Multi-Document Keyphrase Extraction

Here, we address the task of assigning relevant terms to thematically and semantically related sub-corpora and achieve superior results compared to the baseline performance. Our results suggest that more reliable sets of keyphrases can be assigned to the semantically and thematically related subsets of some corpora if the automatically determined sets of keyphrases for the individual documents ...

Ranking Techniques for Keyphrase Extraction

This thesis focuses on the task of extracting keyphrases from research papers. Keyphrases are short phrases that summarize and characterize the contents of documents. They help users explore sets of documents and quickly understand the contents of individual documents. Most academic papers do not have keyphrases assigned to them, and manual keyphrase assignment is highly laborious. As such, the...

A Language-Independent Approach to Keyphrase Extraction and Evaluation

We present Likey, a language-independent keyphrase extraction method based on statistical analysis and the use of a reference corpus. Likey has a very light-weight preprocessing phase and no parameters to be tuned. Thus, it is not restricted to any single language or language family. We test Likey having exactly the same configuration with 11 European languages. Furthermore, we present an autom...

Journal

Journal title: Journal of Research of the National Institute of Standards and Technology

Year: 2022

ISSN: 2165-7254, 1044-677X

DOI: https://doi.org/10.6028/jres.126.053